DOCUMENT RESUME

ED 348 378                                        TM 018 743

AUTHOR          Kromrey, Jeffrey D.; Bacon, Tina P.
TITLE           Item Analysis of Achievement Tests Based on Small
                Numbers of Examinees.
SPONS AGENCY    Florida State Dept. of Education, Tallahassee;
                University of South Florida, Tampa. Inst. for
                Instructional Research and Practice.
PUB DATE        Apr 92
NOTE            45p.; Paper presented at the Annual Meeting of the
                American Educational Research Association (San
                Francisco, CA, April 20-24, 1992).
PUB TYPE        Reports - Evaluative/Feasibility (142);
                Speeches/Conference Papers (150)
EDRS PRICE      MF01/PC02 Plus Postage.
DESCRIPTORS     *Achievement Tests; Comparative Analysis; Difficulty
                Level; *Estimation (Mathematics); Mathematical Models;
                Monte Carlo Methods; Psychometrics; *Sample Size;
                Statistical Bias; Test Construction; *Test Items
IDENTIFIERS     [illegible]

ABSTRACT
     A Monte Carlo study was conducted to estimate the small
sample standard errors and statistical bias of psychometric
statistics commonly used in the analysis of achievement tests. The
statistics examined in this research were: (1) the index of item
difficulty; (2) the index of item discrimination; (3) the corrected
item-total point-biserial correlation coefficient; and (4)
coefficient alpha. Sample sizes of 5, 10, 20, 40, 80, and 160 were
evaluated. One thousand samples of each size were drawn with
replacement from each of 10 archival data files from teacher subject
area tests. These files represent pseudo-populations whose parameters
are directly calculable and from which the sampling bias and errors
of statistics are empirically estimable. The behavior of each
statistic was evaluated by computing the standard error of the
statistic for each sample size and each pseudo-population, and by
computing the statistical bias of the statistic for each sample size
and each pseudo-population. Results are interpreted in terms of their
applications to test development. Tables present study data, and
nine figures illustrate the results. There is a 13-item list of
references. (Author/SLD)






Item Analysis



Item Analysis of Achievement Tests Based
on Small Numbers of Examinees
Jeffrey D. Kromrey
University of South Florida 

Tina P. Bacon 
University of South Florida 



This research was supported in part by a grant from the 
Florida Department of Education and the Institute for 
Instructional Research and Practice at the University 
of South Florida. 

RUNNING HEAD: Item Analysis 

Paper presented at the annual conference of the American Educational
Research Association, April 19-24, 1992, San Francisco, CA.






Abstract 



A Monte Carlo study was conducted to estimate the small sample
standard errors and statistical bias of psychometric statistics
commonly used in the analysis of achievement tests. The
statistics examined in this research were (a) the index of item
difficulty, (b) the index of item discrimination, (c) the
corrected item-total point-biserial correlation coefficient, and
(d) coefficient alpha. Sample sizes of 5, 10, 20, 40, 80, and
160 were evaluated. One thousand samples of each size were drawn
with replacement from each of ten archival data files from
teacher subject area tests. Results were interpreted in terms of
applications to test development.




Item Analysis of Achievement Tests Based
on Small Numbers of Examinees

The traditional techniques of item analysis (i.e., the
calculation of item difficulty indices, item discrimination 
indices, and distractor analyses) may have limited utility when 
the number of examinees on which the calculations are based is 
small. Two statistical issues to consider in the application of 
these techniques to small samples are (a) the magnitudes of the 
standard errors of the statistics, and (b) the potential for 
statistical bias in the estimation of population parameters. 

The purpose of this research was to develop estimates of the 
standard errors and biases of item difficulty, discrimination 
indices, and coefficient alpha when the calculations are based 
on small samples of examinees. 


Knowledge of the standard errors of statistics used in the 
analysis of achievement test items is valuable for the 
interpretation of the results of an item analysis. For example, 
the width of a confidence interval around a calculated index of 
discrimination provides information about the expected amount of
variation in the obtained magnitude of that statistic under 
repeated sampling. 

With the exception of the item difficulty index, the
statistics used in item analysis do not provide easily calculated
standard errors (Perry & Michael, 1954). Further, asymptotic
formulas for standard errors that provide useful approximations
for large sample sizes are frequently inaccurate when applied to
small samples.

The standard errors of item statistics are typically ignored
in traditional item analyses. With large samples of examinees,
the practice of ignoring statistical errors is probably
acceptable because the magnitudes of the standard errors are
reasonably small in such circumstances (being inversely related
to the number of observations on which the statistics are
calculated). With small numbers of examinees, however, the
practice of ignoring standard errors should be seriously
questioned.

Rationale 

The classical true score model for computing item and test 
score indices has served test constructors well for many years. 
The simplicity of the model and the reasonable ease of computing 
indicators of test or item functioning are some of the model's 
advantages. There are, however, some concerns that arise in the 
use of traditional methods for test construction and revision. 

The values of item indices, such as difficulty and
discrimination, are group-dependent (Hambleton, 1989). The
computed values of these statistics vary according to the
attributes, skills, or ability level of the derivation sample.
A group of examinees possessing greater ability will yield
higher item difficulty indices than are obtained when the
statistic is computed from the performance of lower ability
examinees.

Likewise, discrimination and reliability indices are
impacted by the group variability. The magnitude of the
correlation coefficient is dependent on the homogeneity or
heterogeneity of the group, with diverse samples having higher
values than groups having similar ability levels (Lord & Novick,
1968).

It is, in part, the nature of the indices to be group-dependent
that raises the question of what sample size is
necessary to provide the test constructor confidence in using the
indices for test refinement and describing test functioning.
Many researchers have proposed rules of thumb for determining
sample size for conducting item analysis. Nunnally (1967)
suggested that the test constructor have 5 to 10 times as many
subjects as items. Crocker and Algina (1986) proposed that
sample sizes of 200 subjects would offer reasonable statistical
stability. Other researchers and commercial test developers have
recommended sample sizes ranging from 300 to 3000 depending upon
the target population to be served by the instrument (Conrad,
1948; Henryson, 1971; Swineford, 1974).

Little empirical work has been done that directly
investigated the impact of small sample sizes on standard error
and statistical bias of traditional test and item indices. In
fact, the majority of the research has focused on sample size and
sample variability in the application of item-response methods to
test construction and equating.

One study conducted by Nevo (1980) investigated the effects
of various sample sizes on the accuracy of rank-ordering or
categorizing of traditional item indices. Nevo's findings
suggested that sample sizes as low as 100 may be sufficient if
the researcher's goal is to position items in relation to each
other. But the question of the absolute value or loss of
accuracy for individual item indices estimated from small samples
was not addressed.

The lack of empirical efforts examining the relationship of
sample size to standard error and bias in estimating traditional
item indices suggests the need for such study. Empirical
findings may even provide the opportunity to suggest sample size
guidelines for different indices and the estimated cost to
accuracy and utility in making decisions based on values obtained
using various sample sizes from a defined population.

Method 

Four item analysis statistics were examined in this study:
(a) coefficient alpha (equivalently, the Kuder-Richardson Formula
20), (b) the item difficulty index (the proportion of examinees
responding correctly to the test item), (c) the item
discrimination index (the difference between the item difficulty
index for the top 27% of the examinees and the item difficulty
index for the bottom 27% of the examinees), and (d) the item-total
point-biserial correlation coefficient (the correlation
between performance on the test item and total test score,
corrected for overlap).
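
The four definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SAS code used in the study: the function names are hypothetical, responses are assumed to be scored 0/1 with one row per examinee, the size of the 27% groups is assumed to be rounded to the nearest whole examinee, and ties in total score are broken by input order.

```python
from statistics import mean, variance

def item_difficulty(X):
    """p-value of each item: proportion of examinees answering correctly."""
    return [mean(col) for col in zip(*X)]

def discrimination_index(X, frac=0.27):
    """Upper-lower D: item difficulty in the top 27% minus the bottom 27%."""
    k = max(1, round(frac * len(X)))      # assumed rounding rule
    ranked = sorted(X, key=sum)           # tied totals keep their input order
    low, high = ranked[:k], ranked[-k:]
    return [mean(h) - mean(l) for h, l in zip(zip(*high), zip(*low))]

def pearson(x, y):
    """Pearson correlation; NaN when either variable is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / (sxx * syy) ** 0.5

def corrected_point_biserial(X):
    """Correlate each item with the total score excluding that item."""
    totals = [sum(row) for row in X]
    result = []
    for j in range(len(X[0])):
        item = [row[j] for row in X]
        rest = [t - i for t, i in zip(totals, item)]   # overlap removed
        result.append(pearson(item, rest))
    return result

def coefficient_alpha(X):
    """Coefficient alpha; equals KR-20 when items are scored 0/1."""
    k = len(X[0])
    item_var = sum(variance(col) for col in zip(*X))
    total_var = variance([sum(row) for row in X])
    if total_var == 0:
        return float("nan")
    return k / (k - 1) * (1 - item_var / total_var)

# Hypothetical scored responses: 5 examinees (rows) by 4 items (columns).
X = [[1, 1, 1, 1],
     [1, 1, 1, 0],
     [1, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 0]]
p_values = item_difficulty(X)
d_values = discrimination_index(X)
r_values = corrected_point_biserial(X)
alpha = coefficient_alpha(X)
```

With only five examinees the top and bottom 27% groups contain a single examinee each, which illustrates why the discrimination index is so unstable at the smallest sample sizes studied.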

This research was conducted by drawing random samples from
existing archival examination data files. These files represent
pseudo-populations whose parameters are directly calculable, and
from which the sampling bias and errors of statistics are
empirically estimable. Data files from ten teacher subject area
examinations were used. The tests used in this research are
listed in Table 1.



Insert Table 1 about here 



From each pseudo-population, small random samples were drawn 
with replacement. One thousand samples each of sizes 5, 10, 20, 
40, 80, and 160 records were drawn from each of the ten pseudo- 
populations. In each sample, the indices of item difficulty and 
discrimination and the item-total point-biserial correlation were 
computed for each item. In addition, the value of coefficient 
alpha for the test was computed for each sample of examinees.

The behavior of each statistic was evaluated by (a) 
computing the standard error of the statistic for each sample 
size and each pseudo-population, and (b) computing the 
statistical bias of the statistic for each sample size and each 
pseudo-population.
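
The resampling design described above can be sketched as follows. This is a minimal illustration, assuming a toy pseudo-population of hypothetical 0/1 records rather than the archival files, and using the item difficulty of a single item as the statistic; the names `draw_sample`, `replicate`, and `first_item_p` are illustrative only.

```python
import random
from statistics import mean

def draw_sample(population, n, rng):
    """Draw n examinee records with replacement from the pseudo-population."""
    return [rng.choice(population) for _ in range(n)]

def replicate(population, statistic, n, reps=1000, seed=0):
    """Compute the statistic in `reps` independent samples of size n."""
    rng = random.Random(seed)
    return [statistic(draw_sample(population, n, rng)) for _ in range(reps)]

def first_item_p(sample):
    """Item difficulty (proportion correct) of the first item."""
    return mean([row[0] for row in sample])

# Hypothetical pseudo-population: 500 scored records on 3 items.
_gen = random.Random(1)
population = [[int(_gen.random() < p) for p in (0.3, 0.5, 0.9)]
              for _ in range(500)]

# One thousand estimates for each of the six sample sizes in the study.
estimates = {n: replicate(population, first_item_p, n)
             for n in (5, 10, 20, 40, 80, 160)}
```

Fixing the seed makes each run reproducible, which is convenient when the same replicates are later summarized for both standard error and bias.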

The standard error of each statistic was computed as the
standard deviation of the sample estimates of the statistic about
the mean value of the statistic:

$$SE_{\theta j} = \left\{ \frac{\sum_{i} (\hat{\theta}_{ij} - \hat{\theta}_{\cdot j})^{2}}{N - 1} \right\}^{1/2}$$

where

$SE_{\theta j}$ = standard error of the item statistic in samples of
size j,

$\hat{\theta}_{ij}$ = value of the statistic computed from sample i of
size j,

$\hat{\theta}_{\cdot j}$ = mean value of the statistic in the 1000 samples of
size j.

The bias of each statistic was computed as the difference
between the mean value of the statistic in the 1000 samples and
the value of the statistic in the pseudo-population:

$$Bias_{\theta j} = \hat{\theta}_{\cdot j} - \theta$$

where

$Bias_{\theta j}$ = statistical bias in the estimation of the
statistic in samples of size j,

$\hat{\theta}_{\cdot j}$ = mean value of the statistic in the 1000 samples of
size j,

$\theta$ = value of the statistic in the pseudo-population.
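
The two summaries defined above reduce to a standard deviation and a difference of means. The sketch below assumes that N in the standard error formula denotes the number of replicate samples (1000 in the study); the function names and the five toy estimates are hypothetical.

```python
from statistics import mean, stdev

def standard_error(estimates):
    """Standard deviation of the replicate estimates (divisor N - 1)."""
    return stdev(estimates)

def bias(estimates, population_value):
    """Mean of the replicate estimates minus the pseudo-population value."""
    return mean(estimates) - population_value

# Toy replicate estimates of an item p-value; the population value is assumed.
est = [0.4, 0.6, 0.8, 0.6, 0.6]
se = standard_error(est)     # about 0.141
b = bias(est, 0.62)          # about -0.02
```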




All program code for the random sampling and statistical
computations was written in SAS, Version 6.06.

Results 

Because the results of the statistical bias analyses and 
standard error analyses were quite consistent across the ten 
subject area tests examined in this research, and to conserve 
space, detailed results are presented for only one subject area. 
Additional detailed results are available from the authors.

Index of Item Difficulty

Statistical Bias. Little statistical bias is evident in the
estimation of the item difficulty index, even with samples as
small as size 5. Box-and-whisker plots of item-level statistical
bias for each sample size included in the study are presented in
Figure 1. To construct this plot, the difference between the
average p-value of each item (computed across the 1000 samples)
and the p-value calculated from all examinees in the pseudo-
population was computed. The plot provides the distribution of
these differences for each test item on the test form. As is
evident in Figure 1, the expected value of this difference is
nearly zero for each sample size examined. Although the
variability of the individual item biases decreases with
increasing sample size, even with samples of size 5, the item
biases range from only -0.02 to 0.015.






Insert Figure 1 about here 



To further explore statistical bias in the estimation of
item p-values, items were grouped according to the magnitude of
the p-value obtained from the pseudo-population. The average
bias within each group for each sample size was computed. The
results of this analysis are presented in Table 2. No systematic
relationship between the p-value of the item in the pseudo-
population and the degree of statistical bias is evident in this
table. Most importantly, in all categories, the magnitude of
bias is negligible.
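
The grouping step described above can be sketched as a small binning routine. It assumes population p-values reported to two decimals and the 0.10-wide bands that appear in the tables; the function names and the example values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def band(p):
    """Index of the 0.10-wide band holding p; p = 1.00 joins the top band."""
    return min(int(round(p * 100)) // 10, 9)

def mean_bias_by_band(pop_values, biases):
    """Average the item-level biases within each population-value band."""
    groups = defaultdict(list)
    for p, b in zip(pop_values, biases):
        groups[band(p)].append(b)
    return {g: mean(bs) for g, bs in sorted(groups.items())}

# Hypothetical items: population p-values and estimated biases at one size.
pop_p = [0.05, 0.45, 0.48, 0.95, 1.00]
item_bias = [0.002, -0.004, 0.002, 0.000, -0.002]
table_row = mean_bias_by_band(pop_p, item_bias)
```

Band 0 corresponds to p < .10, band 4 to .40-.49, and band 9 to .90-1.00.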



Insert Table 2 about here 



Standard Error. As expected, the standard error of the item
difficulty index is related to the value of the item difficulty
(being largest at p = 0.5). Table 3 presents, for each sample size
examined, the average standard error for items grouped according
to item difficulty. The relationship between the magnitude of
the item difficulty index and its standard error is evident in
this table. In samples of size 5, the average standard error of
items ranging from p = .40 to p = .59 is 0.22, while items with p > .90
present an average standard error of less than 0.10 and items
with p < .10 present an average standard error of 0.13. Note that
the standard error curve flattens with larger sample sizes (this
effect is best seen in the graphic presentation of these data in
Figure 2). With samples of size 20, the standard errors ranged
from 0.04 to 0.11. At N = 40, the average standard error for items
in the middle of the range is less than 0.08, while the average
standard errors for the extreme values of p are between 0.03 and
0.04. Finally, at N = 160, the average standard error ranges only
from 0.01 to 0.04.
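
The pattern in these empirical standard errors can be checked against the textbook binomial formula for the standard error of a proportion, sqrt(p(1 - p)/n). The paper does not state this formula; it is brought in here as an assumption, and the function name is hypothetical.

```python
from math import sqrt

def binomial_se(p, n):
    """Theoretical standard error of a sample proportion."""
    return sqrt(p * (1 - p) / n)

# Theoretical values at p = 0.5 for three of the sample sizes studied.
theoretical = {n: binomial_se(0.5, n) for n in (5, 20, 160)}
for n, se in theoretical.items():
    print(n, round(se, 3))
```

At p = 0.5 the formula gives roughly 0.224, 0.112, and 0.040 for samples of 5, 20, and 160, close to the empirical averages reported in Table 3 for the .50-.59 band (0.222, 0.112, and 0.040). Because p(1 - p) is largest at 0.5, the same formula also reproduces the smaller standard errors at extreme difficulty values.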



Insert Table 3 & Figure 2 about here 



Index of Item Discrimination

Statistical Bias. In contrast to the results obtained with
the item difficulty index, substantial statistical bias is
evident in the estimation of the item discrimination index when
small samples of examinees are used. Box-and-whisker plots of
item-level statistical bias for each sample size included in the
study are presented in Figure 3. As with the plots of the item
difficulty index, this plot presents the distribution of
differences between the average D-value of each item (computed
across the 1000 samples) and the D-value for the item computed
from the pseudo-population. As is evident in Figure 3, the
expected value of this difference is substantially less than zero
for samples of size 5 and 10. The average bias in estimating the
item discrimination index is approximately -0.1 for samples of
size 5. The middle fifty percent of the items on the examination
form present biases ranging from -0.04 to -0.15. For samples of
size 10, the average bias in the estimation of item
discrimination is reduced to -0.04, and the middle fifty percent
of the items present biases ranging from -0.02 to -0.06. With
samples of size 20 or larger, the average bias is reduced to a
negligible level, a result which was consistent across the ten
examination forms included in this study.



Insert Figure 3 about here 



To further explore the statistical bias in the estimation of 
item D-values, the test items were grouped according to the 
magnitude of the D-value obtained from the pseudo-populations. 
The average bias within each group for each sample size was 
computed. The results of this analysis are presented in Table 4 
and Figure 4. 



Insert Table 4 & Figure 4 about here 



The degree of statistical bias in the estimation of the item
discrimination index is proportional to the population value of
the discrimination index. With samples of size 5, the average
bias for items with discrimination indices less than 0.10 is
-0.016. In contrast, the average statistical bias for items with
discrimination indices between 0.30 and 0.39 is -0.117, while
highly discriminating items (0.70 to 0.79) present an average
bias of -0.242. The graph of statistical bias by population
value of the index shows a nearly linear relationship between the
extent of the bias and the value of the statistic in the pseudo-
population. The negative bias in the estimation is substantially
reduced in samples of size 20 or larger.

Standard Error. The average standard error of the item
discrimination index for each sample size examined is presented
in Table 5. The standard errors for the item discrimination
index are notably larger than the errors evident for the indices
of item difficulty. In samples of size 5, the average standard
error of items ranging from D values of 0.30 to 0.39 is 0.45,
while items with D < .10 present an average standard error of 0.27.
At N = 20, the standard errors of the item discrimination index
range from 0.15 to 0.28, and only at sample sizes of 160 do the
standard errors across the range fall below 0.10. Standard error
curves for each sample size are presented in Figure 5.



Insert Table 5 & Figure 5 about here 



Item-Total Point Biserial Correlation

Statistical Bias. The analysis of statistical bias in the
estimation of the item point biserial correlation yielded similar
results to the analysis of the bias in the estimation of the item
discrimination index, although the magnitude of the small sample
statistical bias is substantially reduced. Box-and-whisker plots
of item-level statistical bias for each sample size included in
the study are presented in Figure 6. As with the previous
presentations, this plot presents the distribution of differences
between the average value of the point biserial correlation for
each item (computed across the 1000 samples) and the value of the
point biserial correlation computed from the pseudo-population.
The small sample bias is evident in Figure 6, as the expected
value of this difference is substantially less than zero for
samples of size 5 and 10. The average bias in estimating the
item-total point biserial correlation is approximately -0.06 for
samples of size 5. The middle fifty percent of the items on the
examination form present biases ranging from -0.02 to -0.08. For
samples of size 10, the average bias in the estimation of the
item-total point biserial correlation is reduced to -0.02, and
the middle fifty percent of the items present biases ranging from
zero to -0.03. With samples of size 20 or larger, the average
bias is reduced to a negligible level.



Insert Figure 6 about here



To further explore statistical bias in the estimation of the
item-total point biserial correlation, items were grouped
according to the magnitude of the point biserial correlation
obtained from the pseudo-population. The average statistical
bias within each group for each sample size was computed. The
results of this analysis are presented in Table 6 and Figure 7.
As with the estimation of the item discrimination index, the
degree of statistical bias is related to the population point
biserial correlation. With samples of size 5, the average bias
for items with discrimination indices less than 0.10 is -0.02. In
contrast, the average bias for items with discrimination indices
between 0.20 and 0.29 is -0.058, while highly discriminating
items (0.40 to 0.49) present an average bias of -0.064.



Insert Table 6 & Figure 7 about here 



Interestingly, the bias for items with values of the point
biserial correlation between 0.50 and 0.59 showed a slightly
reduced level of bias (-0.056). The highest degree of
statistical bias was obtained for items with point-biserial
correlations between 0.30 and 0.39. The graph of statistical
bias by population value of the index shows the u-shaped
relationship between the extent of the bias and the value of the
statistic in the pseudo-population. This u-shaped relationship
was found in six of the subject area tests examined in this
study, while in the remaining four subject area tests, the
relationship between degree of bias and the population value of
the statistic was nearly linear. As with the estimation of the
item discrimination index, the negative bias in the estimation of
the point biserial correlation was substantially reduced in
samples of size 20 or larger.

Standard Error. The standard errors of the point biserial
correlation for each sample size examined are presented in Table
7. The standard errors for the point biserial correlation are
similar in magnitude to those obtained for the item
discrimination index. In samples of size 5, the average standard
error of items ranging from rpbis = 0.20 to rpbis = 0.39 is 0.43,
while items with rpbis < 0.10 present an average standard error of
0.31. At N = 20, the standard errors of the point biserial
correlation range from 0.23 (for rpbis between 0.10 and 0.19) to
0.17 (for rpbis between 0.50 and 0.59). In general, across the
ten examinations included in this study, the standard errors for
the point biserial correlation were smaller than those obtained
for the discrimination index, although the difference was
negligible. As with the index of item discrimination, all of the
average standard errors for the point biserial correlation fall
below 0.10 with samples of size 160, although at N = 80, the
standard errors are very close to this value. The standard error
curves for each sample size are presented in Figure 8.



Insert Table 7 & Figure 8 about here 



Coefficient Alpha

Statistical Bias. The analysis of statistical bias in the
estimation of coefficient alpha is presented in Table 8. This
table presents the bias in estimation of alpha for each test form
and each sample size examined in this research. All of the
sample estimates of alpha present negative bias, although with
the large sample sizes, the extent of the statistical bias is
trivial.

Insert Table 8 about here 



With samples of size five, the magnitudes of bias ranged from
-0.059 (test form 9) to -0.126 (test form 3). Doubling the
sample size to 10 reduced bias to the range of
-0.019 (test forms 6 and 9) to -0.091 (test form 3). With
samples of size 20, the statistical bias in the estimation of
coefficient alpha was smaller in magnitude than 0.05 for all ten
test forms examined, and for samples of size 40 the bias was
smaller in magnitude than 0.025.

Standard Error. The standard errors of alpha, estimated for
each of the ten examinations, are presented in Table 9. The
average standard error ranged from 0.02, for samples of size 160,
to 0.18 for samples of size 5. For samples of size 20, the
average standard error of coefficient alpha was 0.07 and only one
of the ten examinations showed a standard error greater than
0.10. Dropping to samples of size 10 increased the average
standard error to 0.12, and seven of the ten examinations showed
standard errors greater than 0.10. Box-and-whisker plots of the
distributions of the sample estimates of coefficient alpha are
presented in Figure 9.



Insert Table 9 & Figure 9 about here 



Discussion 

Of the statistics examined in this research, only the index
of item difficulty provided unbiased estimates of the population
value across the breadth of sample sizes examined. However, the
biases evidenced in the item discrimination index, the item-total
point-biserial correlation coefficient, and coefficient alpha
were substantially reduced with samples of size 20 or larger.
The negative biases obtained for the item discrimination index
and the item-total point-biserial correlation coefficient were
related to the population magnitudes of the statistics, with
greater degrees of statistical bias being associated with more
discriminating items. Of the two statistics, the item-total
point-biserial correlation was the superior performer with small
samples, showing about half the degree of the bias as the item
discrimination index. The standard errors for these two
statistics were comparable, but both showed substantially larger
standard errors than those obtained for the item difficulty
index.

The practical implications of these results are twofold.
First, the results support the use of the sample estimates of the
item difficulty index even with small samples, provided that the
standard errors are considered in their interpretations.
Fortunately, the standard errors are considerably reduced at the
extreme values of item difficulty. In pilot testing operations,
items with extremely high or low values of difficulty are likely
to be flagged for further examination, possible deletion or
modification. The availability of greater precision of
estimation at the extreme values increases confidence in data
support for such decisions.

Secondly, the statistical bias that is evident in the sample
estimates of the item discrimination index and the point-biserial
correlation coefficient suggests greater caution in their
interpretation when the number of examinees is small. In
addition, the standard errors of these estimates are so large
that a conservative interpretation (i.e., using a two standard
error confidence band) renders the estimates virtually useless
because of their lack of precision.
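
A short calculation makes the point about the two standard error band concrete. The sketch assumes a hypothetical observed discrimination index of 0.35 and uses standard errors reported in the Results (0.45 for samples of size 5 in this range of D, and the upper end of the 0.15 to 0.28 range at N = 20); clipping the band to the interval from -1 to 1 is an added assumption.

```python
def two_se_band(estimate, se, lower=-1.0, upper=1.0):
    """Estimate plus or minus two standard errors, clipped to the index range."""
    return max(lower, estimate - 2 * se), min(upper, estimate + 2 * se)

# Hypothetical observed D of 0.35 with standard errors taken from the text.
band_n5 = two_se_band(0.35, 0.45)    # roughly (-0.55, 1.00)
band_n20 = two_se_band(0.35, 0.28)   # roughly (-0.21, 0.91)
```

At a sample size of 5 the band covers negatively discriminating through perfectly discriminating items, so the estimate conveys almost no information about the population value.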

The performance problems evidenced in small samples by both
the point biserial correlation and the discrimination index
suggest the need for an alternative index of discrimination.
Because the point biserial correlation is statistically related
to the independent-means t-test (Kendall & Stewart, 1973),
coefficients based upon nonparametric alternatives to the t-test
may provide indices that are unbiased in small samples and that
are more statistically efficient than the usual indices.
Statistics such as the rank biserial correlation (Glass, 1966;
Cureton, 1968), or those used for nonparametric effect size
estimations (Hedges & Olkin, 1984) should be explored for such
applications.
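
As one illustration of such an alternative, the rank biserial correlation can be sketched from its commonly cited form, twice the difference in mean ranks between the two item-response groups divided by the total number of examinees. This is a sketch under assumptions rather than a method evaluated in this study: tied total scores receive midranks, which is a simplification and not necessarily the tie correction proposed by Cureton (1968), and the function names are hypothetical.

```python
def midranks(values):
    """Ranks from 1..n, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # mean of ranks i+1 through j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def rank_biserial(item, totals):
    """2 * (mean rank of passers - mean rank of failers) / n."""
    ranks = midranks(totals)
    passed = [r for r, x in zip(ranks, item) if x == 1]
    failed = [r for r, x in zip(ranks, item) if x == 0]
    if not passed or not failed:
        return float("nan")               # undefined if everyone passes or fails
    diff = sum(passed) / len(passed) - sum(failed) / len(failed)
    return 2 * diff / len(item)

# Hypothetical item responses and total scores for small groups of examinees.
r_perfect = rank_biserial([1, 1, 1, 0, 0], [9, 8, 7, 3, 2])
r_partial = rank_biserial([1, 0, 1, 0], [4, 3, 2, 1])
```

Because the coefficient depends only on the ordering of total scores, it is unaffected by the shape of the score distribution, which is the property that motivates exploring it for small samples.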

The stability of the results obtained in this research 
across the ten examination forms provides evidence of the 
generalizability of the results. Unfortunately, the use of 
archival data files from operational tests as the populations 
from which samples were drawn imposed lower limits on the 
technical quality of the test items examined. For example, 
negatively discriminating items were almost entirely absent from 
the data files, such items having been eliminated or corrected 
during the test development process. Similarly, tests with
marginal values of internal consistency (alphas of 0.5 or 0.6)
were not available. Further research on traditional item 
analysis statistics, using test forms providing a greater range 
of values for these statistics, is needed to extend these results 
across the breadth of population values of the indices. 






References

Conrad, S. H. (1948). Characteristics and uses of item-analysis
     data. Psychological Monographs, 62, 1-48.

Crocker, L., & Algina, J. (1986). Introduction to classical and
     modern test theory. New York: Holt, Rinehart, and
     Winston, Inc.

Cureton, E. E. (1968). Rank-biserial correlation when ties are
     present. Educational and Psychological Measurement, 28, 77-79.

Hambleton, R. K. (1989). Principles and selected applications
     of item response theory. In R. L. Linn (Ed.), Educational
     measurement (pp. 147-220). Washington, DC: American Council
     on Education.

Glass, G. V. (1966). Note on rank biserial correlation.
     Educational and Psychological Measurement, 26, 623-631.

Hedges, L. V., & Olkin, I. (1984). Nonparametric estimators of
     effect size in meta-analysis. Psychological Bulletin, 96,
     573-580.

Henryson, S. (1971). Gathering, analyzing, and using data on
     test items. In R. L. Thorndike (Ed.), Educational measurement
     (2nd ed.). Washington, DC: American Council on Education.

Kendall, M. G., & Stewart, A. (1973). Advanced theory of
     statistics. New York, N.Y.: Hafner Publishing Co.

Lord, F. M., & Novick, M. R. (1968). Statistical theories of
     mental test scores. Reading, MA: Addison-Wesley.

Nevo, B. (1980). Item analysis with small samples. Applied
     Psychological Measurement, 4, 323-329.

Nunnally, J. C. (1967). Psychometric theory. New York: McGraw-
     Hill.

Perry, N. C., & Michael, W. B. (1954). A tabulation of the
     fiducial limits for the point-biserial correlation
     coefficient. Educational and Psychological Measurement, 14,
     715-721.

Swineford, F. (1974). The test analysis manual. Princeton,
     NJ: Educational Testing Service.



Table 1

FTCE Subject Area Tests Used as Pseudo-populations

                                    N of      N of        Total Score
FTCE Subject Area Test              Items   Examinees      M        SD

[illegible]                          115       325       80.69    13.05
Elementary (1-6)                     140      6405       93.47    12.88
Emotionally Handicapped              117       434       95.97     8.27
English (6-9)                         83       594       62.57     8.24
Guidance                             119       490       88.31     9.69
Mathematics (6-9)                     96       635       55.49    14.33
Physical Education               [illeg.]      390       76.64    11.90
Early Childhood (K-3)                141      1326       98.02    12.09
Social Studies                       157       578      116.37    17.85
Specific Learning Disabilities       118       640       86.21     9.73

Note. [illegible] and [illeg.] mark entries that cannot be read in the
source copy.


Figure 1. Distribution of the Statistical Biases in the Estimation
of the Item Difficulty Index for Six Sample Sizes. Examination
Form: 1. [Box-and-whisker plots not reproduced; the vertical axis
shows bias from about -0.02 to 0.015.]

Table 2

Sampling Bias in the Estimation of the Item Difficulty Index
for Six Sample Sizes
Test Form: 1

                                   Sample Size
Item
Difficulty        5        10        20        40        80       160

<.10          [illeg.]  [illeg.]    0.001  [illeg.]    -0.001  [illeg.]
.10-.19         -0.009    -0.000    0.002     0.002    -0.000    -0.001
.20-.29          0.001     0.002   -0.001    -0.002    -0.000    -0.000
.30-.39       [illeg.]    -0.004   -0.000    -0.002  [illeg.]     0.001
.40-.49       [illeg.]  [illeg.]    0.000  [illeg.]  [illeg.]     0.001
.50-.59         -0.004  [illeg.]   -0.001    -0.002    -0.000  [illeg.]
.60-.69          0.000    -0.005   -0.001    -0.001  [illeg.]     0.001
.70-.79       [illeg.]    -0.003    0.000    -0.002     0.001  [illeg.]
.80-.89       [illeg.]    -0.002 [illeg.]  [illeg.]    -0.000  [illeg.]
.90-1.00      [illeg.]    -0.000 [illeg.]  [illeg.]     0.000     0.000

Note. [illeg.] marks entries that cannot be read in the source copy.

Table 3

Standard Error of the Estimate of the Item Difficulty Index
for Six Sample Sizes
Test Form: 1

                                     STANDARD ERROR, BY SAMPLE SIZE
ITEM DIFFICULTY              5           10           20           40           80          160

<.10                     0.154        0.089        0.065        0.045  [illegible]        0.023
.10-.19            [illegible]        0.112        0.080        0.055        0.040        0.028
.20-.29                  0.194        0.143        0.101        0.069        0.049        0.035
.30-.39                  0.212  [illegible]        0.107        0.075        0.054        0.038
.40-.49                  0.220        0.157        0.111        0.077        0.055        0.039
.50-.59                  0.222        0.157        0.112        0.070        0.055        0.040
.60-.69                  0.214        0.153        0.107        0.076        0.054        0.038
.70-.79                  0.196        0.140  [illegible]        0.070        0.049        0.035
.80-.89                  0.167        0.120  [illegible]        0.059        0.042        0.030
.90-1.00                 0.092        0.065        0.046        0.032        0.023        0.016
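
As a plausibility check on the tabled values, the standard errors can be compared with the textbook standard error of a proportion under sampling with replacement. This formula is not stated in this section and is offered only as an illustration; p = .55 is an assumed midpoint for the .50-.59 band.

```latex
\[
SE(\hat{p}) = \sqrt{\frac{p(1-p)}{N}}, \qquad
\sqrt{\frac{(.55)(.45)}{5}} \approx 0.222, \qquad
\sqrt{\frac{(.55)(.45)}{160}} \approx 0.039
\]
```

These figures are close to the tabled standard errors of 0.222 (N = 5) and 0.040 (N = 160) for items in the .50-.59 difficulty band.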



Figure 2

Standard Errors of Item Difficulty Index

[Plot: vertical axis, standard error (0.000 to 0.250); horizontal axis, Population Value of the Item Difficulty Index (<.10 to .90-1.00); one series per sample size, N = 5, 10, 20, 40, 80, 160.]



Figure 3

Distribution of the Statistical Biases in the Estimation
of the Item Discrimination Index for Six Sample Sizes
Examination Form: 1




Table 4

Sampling Bias in the Estimation of the Item Discrimination Index
for Six Sample Sizes
Test Form: 1

                                              BIAS, BY SAMPLE SIZE
ITEM DISCRIMINATION              5           10           20           40           80          160

<.10                        -0.016       -0.004  [illegible]  [illegible]  [illegible]  [illegible]
.10-.19                     -0.051       -0.023       -0.008       -0.011  [illegible]  [illegible]
.20-.29                     -0.091  [illegible]       -0.011       -0.016  [illegible]       -0.010
.30-.39                     -0.117       -0.053       -0.012  [illegible]       -0.016       -0.014
.40-.49                     -0.152       -0.075       -0.032       -0.033       -0.027       -0.021
.50-.59                     -0.180       -0.078       -0.022       -0.026  [illegible]       -0.012
.60-.69                     -0.219       -0.119       -0.039       -0.055       -0.027       -0.018
.70-.79                     -0.242       -0.100       -0.039       -0.041       -0.028       -0.018



Figure 4

Bias in the Estimation of the Item Discrimination Index

[Plot: vertical axis, bias (0.000 to -0.250); horizontal axis, Population Value of the Item Discrimination Index (<.10 to .70-.79); one series per sample size, N = 5, 10, 20, 40, 80, 160.]



Table 5

Standard Error of the Estimate of the Item Discrimination Index
for Six Sample Sizes
Test Form: 1

                                         STANDARD ERROR, BY SAMPLE SIZE
ITEM DISCRIMINATION              5           10           20           40           80          160

<.10                         0.273        0.197        0.155        0.108        0.077        0.052
.10-.19                      0.336        0.292        0.215        0.143  [illegible]        0.074
.20-.29                      0.434        0.349        0.268        0.182        0.130        0.093
.30-.39                      0.447        0.351        0.272        0.184        0.130        0.094
.40-.49                      0.426        0.359        0.277        0.185        0.132        0.094
.50-.59                      0.412        0.333        0.266        0.175        0.128        0.090
.60-.69                [illegible]        0.301        0.237        0.162        0.119        0.087
.70-.79                          .            .        0.222        0.151        0.108        0.077



Figure 5

Standard Errors of Item Discrimination Index

[Plot: horizontal axis, Population Value of the Item Discrimination Index (<.10 to .70-.79).]






Figure 6

Distribution of the Statistical Biases in the Estimation
of the Item-Total Point-Biserial Correlation for Six Sample Sizes
Examination Form: 1

[Plot: vertical axis, bias (0.02 to -0.24); horizontal axis, sample size.]




Table 6

Sampling Bias in the Estimation
of the Item-Total Point-Biserial Correlation Index
for Six Sample Sizes
Test Form: 1

                                          BIAS, BY SAMPLE SIZE
POINT BISERIAL               5           10           20           40           80          160

<.10                    -0.020       -0.010       -0.008  [illegible]       -0.004  [illegible]
.10-.19                 -0.051       -0.025       -0.019       -0.011       -0.006       -0.002
.20-.29                 -0.058  [illegible]       -0.017       -0.009       -0.004       -0.002
.30-.39                 -0.080  [illegible]       -0.021       -0.010       -0.004       -0.002
.40-.49                 -0.064       -0.028       -0.014       -0.002       -0.000       -0.002
.50-.59                 -0.056  [illegible]       -0.003        0.001        0.001        0.000



Figure 7

Bias in the Estimation of the Point-Biserial Correlation

[Plot: vertical axis, bias; horizontal axis, Population Value of the Point-Biserial Correlation (<.10 to .50-.59).]




Table 7

Standard Error of the Estimate
of the Item-Total Point-Biserial Correlation
for Six Sample Sizes
Test Form: 1

                                     STANDARD ERROR, BY SAMPLE SIZE
POINT BISERIAL               5           10           20           40           80          160

<.10                     0.305        0.235        0.186        0.141  [illegible]        0.070
.10-.19                  0.407        0.301        0.229        0.168        0.118        0.083
.20-.29                  0.428        0.310        0.222        0.158        0.112  [illegible]
.30-.39                  0.427        0.303        0.215        0.146        0.103  [illegible]
.40-.49            [illegible]        0.278  [illegible]        0.134        0.095        0.067
.50-.59                  0.379        0.239        0.165        0.112        0.080  [illegible]
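
For reference, the item-total point-biserial correlation summarized above can be computed as in the sketch below. This is an illustration under stated assumptions, not the authors' code: the function name is hypothetical, and whether the original analysis removed the item from the total score before correlating is not stated here, so the sketch uses the total as given.

```python
import math
import statistics


def point_biserial(item_scores, total_scores):
    """Point-biserial correlation between a 0/1 item and total test scores.

    Uses r = (M1 - M0) / s * sqrt(p * q), with s the N-denominator standard
    deviation of the total scores; this equals the Pearson correlation
    between the two variables.
    """
    n = len(item_scores)
    passed = [t for i, t in zip(item_scores, total_scores) if i == 1]
    failed = [t for i, t in zip(item_scores, total_scores) if i == 0]
    s = statistics.pstdev(total_scores)
    if not passed or not failed or s == 0:
        return None  # undefined when the item or the total score has no variance
    p = len(passed) / n
    mean_diff = statistics.fmean(passed) - statistics.fmean(failed)
    return mean_diff / s * math.sqrt(p * (1 - p))


# Hypothetical sample of five examinees.
r = point_biserial([1, 1, 1, 0, 0], [9, 8, 7, 5, 3])
print(round(r, 3))
```

The function returns `None` when every sampled examinee answers the item the same way, a case that becomes common at the smallest sample sizes and that any replication would need to handle explicitly.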



Figure 8

Standard Errors of Point-Biserial Correlation

[Plot: vertical axis, standard error; horizontal axis, Population Value of the Point-Biserial Correlation (<.10 to .50-.59).]






Table 8

Estimation of Coefficient Alpha by Test Form and Sample Size

                                                                Test Form
Sample Size           1           2           3           4           5           6           7           8           9          10

5                -0.070      -0.096      -0.126 [illegible]      -0.106      -0.063      -0.090      -0.102      -0.059      -0.090
10               -0.022 [illegible] [illegible]      -0.061      -0.067      -0.019      -0.046      -0.063      -0.019      -0.061
20               -0.012      -0.028      -0.046      -0.035      -0.056 [illegible]      -0.020      -0.032      -0.010      -0.025
40          [illegible]      -0.010      -0.023      -0.016      -0.019      -0.006      -0.006      -0.012 [illegible]      -0.014
80          [illegible]      -0.008      -0.013      -0.008 [illegible] [illegible]      -0.005      -0.007      -0.001      -0.006
160              -0.001      -0.002      -0.005 [illegible]      -0.006 [illegible] [illegible]      -0.002      -0.001      -0.003






Table 9

Standard Errors of Coefficient Alpha for Six Sample Sizes

                                                                Test Form
Sample Size           1           2           3           4           5           6           7           8           9          10

5                 0.157       0.184       0.217       0.207       0.203       0.142       0.187       0.197       0.138       0.196
10                0.064       0.129       0.169       0.146 [illegible] [illegible]       0.124       0.154 [illegible]       0.139
20          [illegible] [illegible]       0.107       0.097       0.099 [illegible]       0.075       0.091       0.030       0.081
40          [illegible]       0.042       0.074       0.057       0.062       0.023       0.042       0.053       0.018       0.054
80                0.018       0.032       0.048       0.038       0.040       0.015       0.029 [illegible]       0.012 [illegible]
160               0.011       0.021       0.034       0.026       0.029       0.011       0.020       0.025       0.008       0.025
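
Coefficient alpha, the reliability estimate summarized in the two tables above, can be computed from an examinee-by-item score matrix as sketched below. This is a minimal illustration using the standard formula, not the authors' program; the function name and the toy score matrix are hypothetical.

```python
import statistics


def coefficient_alpha(score_matrix):
    """Coefficient alpha for a list of examinee rows, each a list of item scores."""
    k = len(score_matrix[0])  # number of items
    # Variance of each item across examinees.
    item_variances = [
        statistics.variance([row[j] for row in score_matrix]) for j in range(k)
    ]
    # Variance of the examinees' total scores.
    total_variance = statistics.variance([sum(row) for row in score_matrix])
    if total_variance == 0:
        return None  # undefined when every examinee has the same total score
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical sample: four examinees, three dichotomously scored items.
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = coefficient_alpha(scores)
print(round(alpha, 3))
```

Applying this function to each of the repeated samples of a given size, then averaging and taking the spread of the results, would give quantities comparable in kind to the tabled entries.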






Figure 9

Distribution of Sample Estimates of Coefficient Alpha for Six Sample Sizes
Examination Form: 1

[Plot: vertical axis, sample estimate of coefficient alpha (0 to 1); horizontal axis, sample size.]



